Tiling Multidimensional Iteration Spaces for Multicomputers

Authors

J. Ramanujam

P. Sadayappan

Abstract

This paper addresses the problem of compiling perfectly nested loops for multicomputers (distributed memory machines). The relatively high communication startup costs in these machines render frequent communication very expensive. Motivated by this, we present a method of aggregating a number of loop iterations into tiles that execute atomically: a processor executing the iterations belonging to a tile receives all the data it needs before executing any of the iterations in the tile, executes all the iterations in the tile, and then sends the data needed by other processors. Since synchronization is not allowed during the execution of a tile, partitioning the iteration space into tiles must not result in deadlock. We first show the equivalence between the problem of finding partitions and the problem of determining the cone for a given set of dependence vectors. We then present an approach to partitioning the iteration space into deadlock-free tiles so that communication volume is minimized. In addition, we discuss a method to optimize the size of tiles for nested loops on multicomputers. This work differs from other approaches to tiling in that we present a method of optimizing the grain size of tiles for multicomputers.
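The deadlock-freedom requirement in the abstract can be illustrated with a small sketch. It is not the paper's algorithm. It assumes the commonly used legality condition for atomic tiles: each tile-boundary normal h must satisfy h · d >= 0 for every dependence vector d, so that no two tiles wait on each other. The names `is_deadlock_free` and `tile_of`, the sample dependence vectors, and the tile sizes are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

Vector = Sequence[int]


def dot(a: Vector, b: Vector) -> int:
    """Integer dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def is_deadlock_free(tile_normals: Sequence[Vector],
                     dependences: Sequence[Vector]) -> bool:
    """Assumed legality test: every normal h has h . d >= 0 for every d."""
    return all(dot(h, d) >= 0 for h in tile_normals for d in dependences)


def tile_of(point: Vector,
            tile_normals: Sequence[Vector],
            tile_sizes: Sequence[int]) -> Tuple[int, ...]:
    """Tile coordinates of an iteration: floor((h . p) / size) per normal."""
    return tuple(dot(h, point) // s for h, s in zip(tile_normals, tile_sizes))


# Hypothetical 2-D loop nest with three dependence vectors.
deps = [(1, 0), (0, 1), (1, -1)]

# Axis-aligned tiles: the normal (0, 1) has a negative dot product with (1, -1).
rectangular = [(1, 0), (0, 1)]

# Skewed tiles: both normals have non-negative dot products with every dependence.
skewed = [(1, 0), (1, 1)]

print(is_deadlock_free(rectangular, deps))  # False
print(is_deadlock_free(skewed, deps))       # True
print(tile_of((5, 3), skewed, (4, 4)))      # (1, 2)
```

In this sketch the rectangular tiling is rejected because the dependence (1, -1) crosses one tile boundary in the negative direction, which would make two tiles depend on each other. The skewed normals lie in the set of directions that are non-negative on all dependences, which is the connection to the cone of the dependence vectors mentioned in the abstract.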


Similar resources

Tiling Multidimensional Iteration Spaces for Multicomputers


Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors

This paper presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers have looked at hyperplane partitioning of iteration spaces to reduce communication traffic, the problem of deriving the optimal tiling parameters for minimal communication in loops with general affine index expres...


Reducing Data Communication Overhead for Doacross Loop Nests

If the iterations of a loop nest cannot be partitioned into independent sets, data communication for the data dependences is inevitable when executing them on parallel machines. Loop nests of this kind are referred to as Doacross loop nests. This paper is concerned with compiler algorithms for parallelizing Doacross loop nests for distributed-memory multicomputers. We present a metho...


Tiling of Iteration Spaces for Multicomputers

We deal with compiler support for parallelizing perfectly nested loops for coarse-grain distributed memory machines. The relatively high communication start-up costs in these machines render frequent communication very expensive. We study the effect of clustering communication and the ensuing loss of parallelism on performance and propose a method for aggregating a number of loop iterations int...
